PsychMods

TODO shorten lecture grp levels

Codes

  • Notes:
    1. AY2017/2018 Semester 2 and AY2018/2019 Semester 2 bidding data not available.
    2. The bidding statistics are highly non-normal because they are bounded at zero (a student cannot place a negative bid and a module cannot have a negative number of bidders). Consider zero-inflated or Poisson regression when using these statistics as dependent variables.
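As a rough illustration of the second note, a count outcome such as Bidders could be modelled with Poisson regression through glm(). The data below are simulated and the predictor Morning is a made-up placeholder, so this is only a sketch of the approach under those assumptions, not the analysis used in this project. Zero-inflated models need an additional package and are not shown.

```r
# Simulated data only: counts cannot be negative, so a Poisson GLM is a natural starting point.
set.seed(1)
n <- 200
toy <- data.frame(Morning = rep(c(0, 1), each = n / 2))      # hypothetical predictor
toy$Bidders <- rpois(n, lambda = exp(3 - 0.5 * toy$Morning)) # simulated counts

# Poisson regression with the default log link
fit <- glm(Bidders ~ Morning, data = toy, family = poisson)

# Exponentiated coefficients are rate ratios (multiplicative change in the expected count)
rate_ratios <- exp(coef(fit))
print(rate_ratios)
```

A rate ratio below 1 for Morning would mean fewer expected bidders for morning lectures in this simulated setup.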

Phase 1: Setting Up Environment, Packages And Loading Data

Load myBid.RDS

  • Downloading the data from the API using the code above takes a substantial amount of time.
  • I saved the downloaded data in myBid.RDS and load it directly from my local folder while I worked on the project.
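The save-once, load-later pattern described above might look like the sketch below. The helper name load_or_download and the demo file name are hypothetical, and a small data frame stands in for the slow API download.

```r
# Hypothetical helper: run `download_fun` only when the cached file is missing.
load_or_download <- function(path, download_fun) {
  if (file.exists(path)) {
    return(readRDS(path))        # fast path: read the cached copy
  }
  result <- download_fun()       # slow path: fetch the data
  saveRDS(result, file = path)   # cache it for next time
  result
}

# Usage with a stand-in for the slow download (toy values)
cache_file <- file.path(tempdir(), "myBid_demo.RDS")
myBid_demo <- load_or_download(cache_file, function() data.frame(Bidders = c(10, 100)))
```

On later calls the download function is never invoked, because the cached file already exists.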

Load Module Information

  • Module information was scattered across different folders.
  • Used a loop to repeat the process of downloading and converting to dataframe across the different folders accessed by the different URLs.
    • The same concept was used to consolidate information about the Module Titles.
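library(rjson)   # assumed source of fromJSON(), since it is called with a file argument
library(stringr) # str_detect()
library(dplyr)   # %>%, select(), filter(), distinct(), right_join()
myModInfo <- data.frame() # create empty dataframe which will act as a container to be populated with data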
for(year in c(2011:2018)) # looping through each year
{
  for(semester in c(1,2))
  {
    # create the url where data is to be extracted from
    myurl <- paste0("https://api.nusmods.com/", year, "-", year + 1, "/", semester, "/moduleTimetableDeltaRaw.json")
    myjson <- fromJSON(file = url(myurl))
    for(r in seq_along(myjson)) # for each element in the myjson list, append it to myModInfo
    {
      if(isTRUE(str_detect(myjson[[r]]$ModuleCode, "^PL"))) # only keep info if module code begins with PL
      {
        if(myjson[[r]]$Semester %in% c(1, 2)) # only get semester 1 and 2 information
        {
          myModInfo <- rbind(myModInfo, myjson[[r]]) # add to dataframe
        }
      }
      myjson[[r]] <- NA # replace the element with NA to free up some RAM
    }
    cat(year, "Semester", semester, "Done!\n") # progress tracker, one line per semester
  }
}

myTitles <- data.frame() # create empty dataframe which will act as a container to be populated with data
for(year in c(2014:2018)) # looping through each year
{
    myurl <- paste0("https://api.nusmods.com/", year, "-", year + 1, "/moduleList.json") # create the url where data is to be extracted from
    myjson <- fromJSON(file = url(myurl))
    for(r in seq_along(myjson)) # for each element in the myjson list, append it to myTitles
    {
      if(isTRUE(str_detect(myjson[[r]]$ModuleCode, "^PL"))) # only keep info if module code begins with PL
      {
        # only keep modules offered in semester 1, semester 2, or both
        if(paste0(myjson[[r]]$Semester, collapse = "|") %in% c("1", "2", "1|2"))
        {
          myTitles <- rbind(myTitles, as.data.frame(myjson[[r]])) # add to dataframe
        }
      }
      myjson[[r]] <- NA # free RAM
    }
}

myModInfo <- myTitles %>% # add titles information to myModInfo
  select(ModuleCode, ModuleTitle) %>% # select these two columns
  filter(ModuleTitle != "Lab in Applied Psychology") %>%
  distinct() %>% # remove duplicates
  right_join(myModInfo, by = "ModuleCode") # left = myTitles, right = myModInfo

saveRDS(myModInfo, file = "myModInfo.RDS") # save to directory

Load myModInfo.RDS

  • Downloading the data from the API using the code above takes a substantial amount of time.
  • I saved the downloaded data in myModInfo.RDS and load the data directly while I worked on the project.

Phase 2: Filter, Transform And Merge

Module Information

  • Filter information from the dataframe myModInfo.
    • Removing non-Psychology modules.
    • Removing modules without module titles. These modules appeared before AY2014/2015 and never resurfaced afterwards.
    • Removing information about tutorials.
Filter
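A minimal sketch of the three filters listed above, using dplyr and stringr on a toy data frame. The column name LessonType, the rule "PL followed by a digit" for Psychology codes, and the module codes PLS0000 and PL0000 are assumptions made for illustration; the actual filtering code is not reproduced here.

```r
library(dplyr)
library(stringr)

# Toy stand-in for myModInfo (titles and two of the codes are placeholders)
toy_info <- data.frame(
  ModuleCode  = c("PL1101E", "PL1101E", "PLS0000", "PL0000"),
  ModuleTitle = c("Title A", "Title A", "Title B", NA),
  LessonType  = c("LECTURE", "TUTORIAL", "LECTURE", "LECTURE"),
  stringsAsFactors = FALSE
)

filtered_info <- toy_info %>%
  filter(str_detect(ModuleCode, "^PL\\d")) %>% # keep codes of the form PL + digit (assumed rule)
  filter(!is.na(ModuleTitle)) %>%              # drop modules without a title
  filter(LessonType != "TUTORIAL")             # drop tutorial rows

print(filtered_info)
```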

Bidding Information

  • Filter information from the dataframe myBid.
    • Removing non-Psychology modules, including Roots and Wings (prefixed with PLS-) and Psychology for non-Psychology students (prefixed with PLB-).
    • Removing information from quotas that are reserved and not available for bidding.
    • Removing information from modules with more than one lecture/seminar session.
    • Removing bidding information from non-psychology students.
  • Create a new variable, ClassNo, derived from Group so that it can be used as a key when merging with myModInfo.
Filter & Transform
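One way the Group to ClassNo transformation could be done is to pull the trailing digits out of values such as "LEC1". The sketch below assumes that pattern holds for every Group value and uses a made-up PLB-prefixed code (PLB0000) to show the prefix filter; it is not the original code.

```r
library(dplyr)
library(stringr)

# Toy stand-in for myBid
toy_bid <- data.frame(
  ModuleCode = c("PL1101E", "PL3232", "PLB0000"),
  Group      = c("LEC1", "LEC2", "LEC1"),
  stringsAsFactors = FALSE
)

toy_bid <- toy_bid %>%
  filter(str_detect(ModuleCode, "^PL\\d")) %>%  # drops PLS- and PLB-prefixed codes
  mutate(ClassNo = str_extract(Group, "\\d+$")) # "LEC1" -> "1", "LEC2" -> "2"

print(toy_bid)
```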

Merge

  • Combine the information of myModInfo and myBid.
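A hedged sketch of the merge, assuming ModuleCode and ClassNo are the join keys. A real merge would presumably also need the academic year and semester as keys, and the toy values below are for illustration only.

```r
library(dplyr)

# Toy bidding rows and toy timetable rows
toy_bid <- data.frame(
  ModuleCode = c("PL1101E", "PL2131"),
  ClassNo    = c("1", "1"),
  Bidders    = c(100, 42),
  stringsAsFactors = FALSE
)
toy_info <- data.frame(
  ModuleCode = c("PL1101E", "PL2131"),
  ClassNo    = c("1", "1"),
  DayText    = c("Monday", "Wednesday"),
  StartTime  = c(1800, 1600),
  stringsAsFactors = FALSE
)

# Keep every bidding row and attach the matching timetable information
merged <- left_join(toy_bid, toy_info, by = c("ModuleCode", "ClassNo"))
print(merged)
```

A left join keeps bidding rows even when no timetable match exists, which makes unmatched rows visible as NA values during diagnostics.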

Phase 3: Data Wrangling

  • The variables available in the original data are useful, but they are too specific to interpret meaningfully on their own.
  • This section creates new variables based on the original data, which allows us to better discern trends in the data.
  • Also includes additional wrangling and manipulations to ease the plotting of graphs and analysis later.
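As an example of such derived variables, the sketch below builds BidPerQuota, Period and Level in base R. The definitions are inferred from the variable names and the str() output further down (for example, 42 bidders for a quota of 12 gives 3.5), so treat the 12pm cutoff and the use of the third character of the module code as assumptions.

```r
# Toy rows; values are for illustration only
toy <- data.frame(
  ModuleCode = c("PL1101E", "PL2131", "PL2132"),
  Quota      = c(430, 12, 35),
  Bidders    = c(100, 42, 8),
  StartTime  = c(1800, 1600, 800),
  stringsAsFactors = FALSE
)

# Bidders per available place
toy$BidPerQuota <- toy$Bidders / toy$Quota

# Morning if the lecture starts before 12pm (assumed cutoff)
toy$Period <- factor(
  ifelse(toy$StartTime < 1200, "Morning", ">=Afternoon"),
  levels = c("Morning", ">=Afternoon")
)

# Level taken from the first digit of the module code (assumed rule)
toy$Level <- paste("Level", substr(toy$ModuleCode, 3, 3))

print(toy)
```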

Phase 4: Data Diagnostics

  • Plot univariate histograms and bivariate plots using loops for almost every combination of variables.
  • The graphs in this section are for diagnostics rather than exploration. They make little sense as a source of insight because each statistic is aggregated across all other variables.
    • For example: The mean of Bidders is calculated across all academic years, all bidding rounds, all modules…
  • What I am looking out for in this section are odd patterns, like zeroes in places where they shouldn’t be, missing data, highly non-normal data, variables with outliers, etc…
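A minimal sketch of a diagnostic loop of this kind in base R, run on a toy data frame with made-up values. It draws one histogram per numeric column and counts missing values and zeroes; the bivariate plots and the actual data are not reproduced.

```r
# Toy data frame standing in for the merged data (values are made up)
toy <- data.frame(
  Quota   = c(95, 430, 5, 12, 35),
  Bidders = c(10, 100, 0, 42, NA),
  Round   = factor(c("1A", "1A", "1B", "1B", "1C"))
)

numeric_cols <- names(toy)[sapply(toy, is.numeric)]

pdf(NULL) # null graphics device, so nothing is written to disk
for (v in numeric_cols) {
  hist(toy[[v]], main = paste("Histogram of", v), xlab = v)
}
invisible(dev.off())

# Quick checks for the odd patterns mentioned above
n_missing <- colSums(is.na(toy))
n_zero <- sapply(toy[numeric_cols], function(x) sum(x == 0, na.rm = TRUE))
print(n_missing)
print(n_zero)
```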

Univariate Descriptive Statistics

## 'data.frame':    1934 obs. of  20 variables:
##  $ AcadYear           : Factor w/ 8 levels "2011/2012","2012/2013",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Semester           : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Round              : Factor w/ 7 levels "1A","1B","1C",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ModuleCode         : Factor w/ 87 levels "PL1101E","PL2131",..: 1 1 2 2 3 3 4 4 5 5 ...
##  $ Group              : Factor w/ 4 levels "LEC1","LEC2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Quota              : num  95 430 5 12 35 35 28 50 25 22 ...
##  $ Bidders            : num  10 100 3 42 8 3 7 2 8 5 ...
##  $ LowestBid          : num  1 1 1 205 1 1 1 1 1 1 ...
##  $ LowestSuccessfulBid: num  1 1 1 977 1 1 1 1 1 1 ...
##  $ HighestBid         : num  500 1150 368 1255 500 ...
##  $ StudentAcctType    : Factor w/ 4 levels "New[P]","NUS[P]",..: 3 1 3 1 3 1 3 1 3 1 ...
##  $ Group1             : chr  "LECTURE 1" "LECTURE 1" "LECTURE 1" "LECTURE 1" ...
##  $ ClassNo            : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ModuleTitle        : Factor w/ 85 levels "Abnormal Psychology",..: 34 34 74 74 75 75 8 8 13 13 ...
##  $ DayText            : Factor w/ 5 levels "Monday","Tuesday",..: 1 1 3 3 2 2 2 2 3 3 ...
##  $ StartTime          : num  1800 1800 1600 1600 800 800 1200 1200 1400 1400 ...
##  $ Level              : Factor w/ 4 levels "Level 1","Level 2",..: 1 1 2 2 2 2 3 3 3 3 ...
##  $ BidPerQuota        : num  0.105 0.233 0.6 3.5 0.229 ...
##  $ Period             : Factor w/ 2 levels "Morning",">=Afternoon": 2 2 2 2 1 1 2 2 2 2 ...
##  $ Category           : chr  "Core" "Core" "Core" "Core" ...
## Warning in describe(mydata): NAs introduced by coercion

## Warning in describe(mydata): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf

Bivariate Plots

  • Plots to illustrate pairwise relationships amongst variables.

Phase 5: Answering Questions

Do fewer people bid for a module if the lecture begins in the morning (before 12pm)?

Let's look at each module and compare the average number of bidders, bidders per quota and lowest successful bid when the lecture begins in the morning versus in the afternoon or later.
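The comparison could be set up as a grouped summary, roughly as sketched below. The module codes PL_A and PL_B and every number in the toy data are made up, so the output says nothing about the real bidding data.

```r
library(dplyr)

# Made-up rows: two placeholder modules, each offered in both periods
toy <- data.frame(
  ModuleCode          = rep(c("PL_A", "PL_B"), each = 4),
  Period              = rep(c("Morning", "Morning", ">=Afternoon", ">=Afternoon"), times = 2),
  Bidders             = c(10, 14, 30, 26, 8, 12, 9, 11),
  Quota               = 20,
  LowestSuccessfulBid = c(1, 1, 400, 600, 1, 1, 1, 1),
  stringsAsFactors    = FALSE
)

# One row per module and period, with the three averages of interest
by_period <- toy %>%
  group_by(ModuleCode, Period) %>%
  summarise(
    MeanBidders = mean(Bidders),
    MeanBpQ     = mean(Bidders / Quota),
    MeanLSB     = mean(LowestSuccessfulBid),
    .groups     = "drop"
  )

print(by_period)
```

Comparing within each module keeps differences between modules from being mistaken for a time-of-day effect.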

Bonus: Multilevel Modeling
Peek Data

Do results from previous rounds…

Post

Module Bidding

My favourite part of university education was the ability to pick and choose modules. Excluding the compulsory modules, we were frequently spoiled for choice when it came to the electives. But this freedom came at a cost: we had to bid for the modules instead of simply being assigned them. The bidding system (CORS) was created to cope with the reality that certain modules were in higher demand, yet the modules had limited capacity. Students had to carefully ration their limited bid points, which were used to win auctions for desired modules. Do you go all-in on an extremely popular module and risk being stuck with no points to bid for the remaining modules? Or do you spread the risk and bid moderately on multiple modules that align with your interests?


Inadvertently, we began to observe, hypothesize and act on certain trends to guide our bidding choices. We might even share advice based on these trends. For example:


Modules belonging to the domain of Clinical Psychology are the most popular, so you need to plan ahead and stockpile points from previous semesters if you plan on bidding for them.

This advice came from numerous seniors and I even shared it with my juniors. It became more convincing after I watched peers grieve over their inability to secure a place in Introduction to Counselling Psychology or Psychological Therapies due to the exorbitant number of points required (which meant students had to stockpile points from previous semesters). But I have never heard anyone claim that they really wanted to study Cognitive Neuroscience but failed to bid for it.

But was it true that Clinical Psychology modules were the most popular? Rather than inferring trends from personal anecdotes and observation, do we have data to support this claim? The answer is yes! Bidding statistics and other module information are available at https://nusmods.com/api/, all thanks to the team at NUSMods, who created a great timetabling tool for all NUS students.

The information was downloaded, extracted, transformed, analyzed and visualized using R. The code is available under the Codes tab above. The API contains data for all modules from different majors and faculties, but I will focus only on Psychology modules in this post as I am more familiar with them.


Module Categories

For the typical Psychology major, there are four broad categories of modules.

Category | Description
Core | Modules that are required for all undergraduates. These include PL1101E, PL2131, PL2132 and PL3232 to PL3236.
Level 3 Electives | Modules that are outside of the core modules. All undergraduates need between four and six of these to graduate. Their module codes run from PL3237 to PL3260.
Level 3 Lab Modules | Lab modules are structured as individual or group research projects in a specific domain of Psychology. Every undergraduate is required to complete at least one of these modules. Their module codes take the form PL328x.
Level 4 Honors Modules | Modules that are required to graduate on the Honors track, usually taken near the end of the undergraduate degree. Between three and eight of these are required to graduate. Their module codes take the form PL4xxx.

Within each category, were Clinical Psychology modules most popular?

Measures Of Popularity

Luckily, the bidding data contains potential indicators of popularity. These are the key bidding statistics/variables which will be used to compare popularity:

  1. Quota
    • The maximum number of students allowed in the module.
  2. Bidders
    • The number of students who placed a bid on the module.
  3. Bidders Per Quota (BpQ)
    • The number of bidders for each available quota, \(BpQ = \frac{Bidders}{Quota}.\)
  4. Lowest Successful Bid (LSB)
    • The lowest bid that was allocated the module. Students who bid below this value were not allocated the module.
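To make BpQ and LSB concrete, here is a small worked example in R. The bids and quota are hypothetical, and the allocation rule (the highest bids win until the quota is filled) is a simplification that ignores ties and any other CORS-specific rules.

```r
# Hypothetical bids placed on one module, and its quota
bids  <- c(1255, 1100, 977, 600, 205)
quota <- 3

# Bidders per Quota: 5 bidders for 3 places
bpq <- length(bids) / quota

# Simplified allocation: the top `quota` bids are successful
successful <- sort(bids, decreasing = TRUE)[seq_len(min(quota, length(bids)))]

# Lowest Successful Bid: the smallest bid that still won a place
lsb <- min(successful)

print(bpq)
print(lsb)
```

In this toy case the two bids below 977 are unsuccessful, and a BpQ above 1 signals that demand exceeded supply.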

The bar graphs below illustrate the mean Quota, Bidders, BpQ and LSB of each module category, calculated across all modules, semesters and rounds. The categories vary greatly, and these differences make it difficult to meaningfully compare popularity across categories.

We define a popular module as one possessing the following characteristics in Round 1A (the first round of bidding):

  1. Maximum Quota available.
    • Some background on the bidding system: Round 1A is officially the first round but there is a Module Preference Exercise before Round 1A. In this exercise, all students declare the modules that they wish to study for the coming semester.
    • When the total number of students that wish to study a particular module is less than the quota (demand < supply), these students will be allocated the module for free. The unfilled quota will be up for bidding in Round 1A.
    • If the number of interested students exceeds the quota (demand > supply), no students will be allocated the module and all quotas will be up for bidding. Popular modules are expected to fall into this scenario, thus their quota in Round 1A should be at a maximum.
  2. Number of Bidders exceeds the Quota.
  3. High LSB.
  4. High BpQ.

Modules that do not fit criteria 1 and 2 will not be considered popular. Amongst the modules that satisfy both, criteria 3 and 4 will be used to determine which were the most popular.

Level 4 Honors Modules

The bar plot displays the mean LSB of level 4 modules in Round 1A, averaged across all academic years, semesters, lecture slots (for modules with multiple lecture slots) and account types. Only modules with a median available Quota of 40 and above (criterion 1) and a BpQ of more than 1 (criterion 2) are displayed. Hover over the respective bars to view other statistics such as the mean/median number of Bidders, Quota, BpQ and LSB.
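The screening described above could be expressed with dplyr roughly as follows. The module codes PL4_A to PL4_C and all numbers are made up, and applying the BpQ threshold to the per-module mean is an assumption, since the text does not say whether the mean or the median was used.

```r
library(dplyr)

# Made-up Round 1A rows for three placeholder modules
toy <- data.frame(
  ModuleCode          = c("PL4_A", "PL4_A", "PL4_B", "PL4_B", "PL4_C", "PL4_C"),
  Round               = "1A",
  Quota               = c(40, 45, 40, 40, 10, 12),
  Bidders             = c(60, 70, 20, 30, 30, 40),
  LowestSuccessfulBid = c(900, 1100, 1, 1, 1500, 1700),
  stringsAsFactors    = FALSE
)

popular <- toy %>%
  filter(Round == "1A") %>%
  group_by(ModuleCode) %>%
  summarise(
    MedianQuota = median(Quota),
    MeanBpQ     = mean(Bidders / Quota),
    MeanLSB     = mean(LowestSuccessfulBid),
    .groups     = "drop"
  ) %>%
  filter(MedianQuota >= 40, MeanBpQ > 1) %>% # criteria 1 and 2
  arrange(desc(MeanLSB))                     # rank the survivors by mean LSB

print(popular)
```

In this toy data only PL4_A passes both screens: PL4_B has too few bidders and PL4_C has too small a quota.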

Level 3 Lab Modules

Elective Modules

I selected three more trends to investigate.

There used to be a time when Clinical Psychology modules were not popular.

I mentioned that there was a perceived trend that Clinical Psychology modules were really popular, but I recently met a few slightly more senior seniors in the workforce (who graduated almost a decade ago). They told me, much to my surprise, that Clinical Psychology was pretty unpopular back in their days. Was this supported by the data?

A module is less popular when the lecture is early in the day.

Lessons at 8am were painful. I live somewhere in the North-East, so I would have to wake up at around 6.15am to arrive comfortably for 8am lessons (including time taken for the shuttle bus in NUS). I generalized from my personal opinion to form this myth: that 8am lessons would be less popular than their later counterparts. My friends generally shared the same sentiment, but was it reflected in the bidding statistics? Or were we projecting our laziness onto other students?

Module bidding became more competitive in later cohorts, so you should bid somewhat higher than previous winning bids to account for the inflation in points caused by the increased competitiveness.

Almost everyone checks the bidding statistics from previous iterations of the module to estimate how many points to bid. Of course, there should be a positive correlation between past and future bidding statistics, but is it advisable to anchor your bids on the lowest successful bids from the past? If so, how much higher or lower should we bid compared to previous iterations? There are a few hypothesized reasons for the increased competitiveness, such as the cohort expanding while module quotas remain constant, or the tendency for students to always bid a little higher than previous winning bids, which leads to an upward cycle of bid point inflation. Does bid point inflation really exist?
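One simple way to probe this question is to look at the trend in mean LSB across academic years and at how strongly each year's LSB tracks the previous year's. The sketch below uses made-up yearly means, so the numbers are placeholders rather than findings.

```r
# Made-up yearly mean LSB values for one placeholder module
toy <- data.frame(
  YearStart = 2011:2018,
  MeanLSB   = c(500, 520, 560, 555, 600, 640, 630, 690)
)

# Linear trend: the slope is the average change in mean LSB per academic year
trend <- lm(MeanLSB ~ YearStart, data = toy)
slope <- coef(trend)[["YearStart"]]

# Correlation between each year's mean LSB and the previous year's
lag_cor <- cor(toy$MeanLSB[-1], toy$MeanLSB[-nrow(toy)])

print(slope)
print(lag_cor)
```

A positive slope would be consistent with bid point inflation, and the slope itself gives a rough guide to how much higher than last year's LSB a bid might need to be.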